# High-Precision Visual Understanding
Pixtral 12b Quantized.w8a8
Apache-2.0
An INT8-quantized (W8A8) version of mgoin/pixtral-12b that supports vision-text multimodal tasks with improved inference efficiency.
Image-to-Text
Transformers English

RedHatAI
309
1
VARCO VISION 14B
VARCO-VISION-14B is a powerful English-Korean Vision-Language Model (VLM) that takes image and text input, generates text output, and supports grounding, referring, and OCR.
Image-to-Text
Transformers Supports Multiple Languages

NCSOFT
1,022
28
Xgen Mm Phi3 Mini Instruct Interleave R V1.5
Apache-2.0
xGen-MM is a series of foundational large multimodal models (LMMs) developed by Salesforce AI Research. It builds on the design of the BLIP series, with fundamental enhancements intended to provide a more robust foundation.
Image-to-Text
Safetensors English
Salesforce
7,373
51
Florence 2 Large Ft Moredetailed
MIT
A fine-tune of the Florence-2-large-ft model on the ImageInWords dataset, focused on generating more detailed image descriptions.
Image-to-Text
Transformers English

yayayaaa
227
13
Git Base Minecraft
MIT
An image-to-text model that generates descriptions of images.
Image-to-Text
Transformers Supports Multiple Languages

orzhan
22
0
Featured Recommended AI Models